Morpho-Syntactic Descriptions in MULTEXT-East - the Case of Serbian
Authors
Cvetana Krstev (Faculty of Philology, University of Belgrade, Studentski trg 3, 11000 Belgrade, Serbia and Montenegro)
Duško Vitas (Faculty of Mathematics, University of Belgrade, Studentski trg 16, 11000 Belgrade, Serbia and Montenegro)
Tomaž Erjavec (Department of Knowledge Technologies, Jožef Stefan Institute, Jamova 39, 1000 Ljubljana, Slovenia)
Related resources
MULTEXT-East Resources for Serbian
The paper presents the MULTEXT-East language resources for the Serbian language. MULTEXT-East is a multilingual dataset for language engineering research and development. This standardised and linked set of resources covers a large number of mainly Central and Eastern European languages and includes the EAGLES-based morphosyntactic specifications, defining the features that describe word-level s...
Using a Large Set of EAGLES-compliant Morpho-Syntactic Descriptors as a Tagset for Probabilistic Tagging
The paper presents one way of reconciling data sparseness with the requirement of high accuracy tagging in terms of fine-grained tagsets. For lexicon encoding, EAGLES elaborated a set of recommendations aimed at covering multilingual requirements and therefore resulted in a large number of features and possible values. Such an encoding, used for tagging purposes, would lead to very large tagset...
MULTEXT-East Version 3: Multilingual Morphosyntactic Specifications, Lexicons and Corpora
The paper presents the third edition of the MULTEXT-East language resources, a multilingual dataset for language engineering research and development. This standardised and linked set of resources covers a large number of mainly Central and Eastern European languages and includes the EAGLES-based morphosyntactic specifications, defining the features that describe word-level syntactic annotation...
A Description of Morphological Features of Serbian: a Revision using Feature System Declaration
In this paper we discuss some well-known morphological descriptions used in various projects and applications (most notably MULTEXT-East and Unitex) and illustrate the encountered problems on Serbian. We have spotted four groups of problems: the lack of a value for an existing category, the lack of a category, the interdependence of values and categories lacking some description, and the lack o...
Morpho-syntactic Clues for Terminological Processing in Serbian
In this paper we discuss morpho-syntactic clues that can be used to facilitate terminological processing in Serbian. A method (called SRCE) for automatic extraction of multiword terms is presented. The approach incorporates a set of generic morpho-syntactic filters for recognition of term candidates, a method for conflation of morphological variants and a module for foreign word recognition. Mo...
Journal: Informatica (Slovenia)
Volume: 28
Publication date: 2004